Abstract

Cephalosporins are of key clinical importance in the treatment of bacterial infections. In the
past decade, many cephalosporins have been synthesized and evaluated for antibacterial
activity. Cephalosporins with broad spectra of activity and high stability against various
β-lactamases, such as cefixime, cefteram pivoxil, and cefpodoxime proxetil, have been
developed and introduced into clinical practice, and efforts to synthesize compounds with
better activity are ongoing. It is very important that an antibiotic has favorable
pharmacokinetic properties [absorption, distribution, metabolism, excretion (ADME)].
Hence, predicting the pharmacokinetic parameters of a new molecule at an early stage of drug
design is as important as predicting its activity. With rapid advances in the computing power
of machines and in the availability of experimental data, these ADME properties can now be
better predicted using suitable computational methods. In the present study, an attempt has
been made to derive quantitative relationships between the structure of cephalosporins and
one important pharmacokinetic property, serum (plasma) protein binding.
1. Introduction


A cherished goal of chemists for generations
has been to create molecules with specific
properties. Finding new drugs, in particular,
is an important part of new initiatives in
health care. However, it is an extremely
challenging process because of the
complexities involved [1]. Traditionally, a
combination of serendipity and empiricism
has been the basis of new drug discovery.
Trial-and-error synthesis of compounds and
their random screening for activity have
proved to be both time-consuming and
uneconomical. Further, therapeutic effects
and health hazards are assessed using a
series of experimental and in vivo tests.
However, the use of animal models is often
subject to ethical (and financial)
considerations. Therefore, alternative
methods have been under development to
reduce the number of animals required for
testing [2].
Structure-based design, spurred by the
significant pitfalls of traditional methods
and by rapid advances in molecular structure
determination and computational resources,
has been explored as a means of generating
new pharmaceuticals [3, 4] and of predicting
their properties prior to synthesis [5].

The structural formula of an organic
compound, in principle, encodes all the
information that determines the chemical,
biological, and physical properties of that
compound. If we can understand how a
molecular structure brings about a particular
effect in a biological system, we have a key
to unlocking the relationship and using that
information to our advantage. Formalizing
such relationships on this premise laid the
foundation for predictive models. If we take
a series of chemicals and attempt to form a
quantitative relationship between the
biological effects (i.e., bioactivity) and the
chemistry (i.e., structure) of each of the
chemicals, it is possible to form a
quantitative structure-activity relationship,
or QSAR [6, 7].
Quantitative structure-property
relationships (QSPRs) are mathematical
models that attempt to relate the
structure-derived features of a compound to
its biological or physicochemical properties.
Similarly, the terms quantitative
structure-toxicity relationship (QSTR) and
quantitative structure-pharmacokinetic
relationship (also abbreviated QSPR) are
used when the modeling is applied to
toxicological or pharmacokinetic endpoints.
QSAR (as well as QSPR and QSTR) works on
the assumption that structurally similar
compounds have similar activities. These
methods therefore have predictive and
diagnostic abilities. They can be used to
predict the biological activity (e.g., IC50)
or class (e.g., inhibitor versus
non-inhibitor) of compounds before actual
biological testing. They can also be used to
analyze the structural characteristics that
give rise to the properties of interest.

The explosive development of computer
technology and of methods to calculate
molecular properties has made it
increasingly possible to use computer
techniques to aid the drug discovery
process. The use of computer techniques in
this context is often called computer-aided
drug design (CADD). However, since drug
development involves a large number of steps
in addition to the development of a
high-affinity ligand, the more appropriate
name computer-aided ligand design (CALD) has
also been proposed [8].


2. Materials and Methods 


The present study was undertaken with the
objective of establishing quantitative
structure-pharmacokinetic relationships
(QSPRs) of prognostic relevance in the
β-lactam (cephalosporin) series of drugs.
The β-lactam series was selected because
such correlations have been developed for
very few drugs, and the few QSPR reports
available for this series involve only small
sets of drugs and few descriptors. Thus,
quantitative relationships between
structural descriptors of cephalosporin
molecules and serum protein binding (PB)
were evaluated.

The work was divided into the following
three phases:

1. Computation of molecular descriptors 

2. Compilation of pharmacokinetic data 

3. Development of meaningful correlations 


Computation of molecular descriptors 

It is well known that the structure of a
drug molecule can be expressed
quantitatively in terms of its
physicochemical descriptors, which are
lipophilic, electronic, and steric in
nature. These physicochemical descriptors
govern the biological activity of the
compounds.

The PubChem database contains 2D and 3D
minimized structures of a large number of
drugs and other molecules. 3D structures of
the 39 cephalosporins selected for the study
were downloaded from the database and used
as such for the correlation studies. A
sample 3D structure of one of the
cephalosporins used in the study, cefaclor,
is given in Figure 1. Structures of the 39
cephalosporins in molfile format were used
as input for the computation of
descriptors.
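The text does not state how the structures were retrieved from PubChem. One possible route, sketched here as an assumption, is PubChem's PUG REST interface, which can return a compound's 3D conformer as an SDF record (a single-record SDF is a molfile followed by a `$$$$` delimiter). The helper names `pubchem_3d_sdf_url` and `fetch_3d_sdf` are hypothetical; only the URL pattern belongs to PubChem.

```python
import urllib.parse
import urllib.request

# Base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_3d_sdf_url(name: str) -> str:
    """Build the PUG REST URL for the 3D SDF record of a compound looked up by name.

    Hypothetical helper; the pattern compound/name/<name>/SDF?record_type=3d is PubChem's.
    """
    return f"{PUG_REST}/compound/name/{urllib.parse.quote(name)}/SDF?record_type=3d"


def fetch_3d_sdf(name: str, timeout: float = 30.0) -> str:
    """Download the 3D SDF text for one compound (requires network access)."""
    with urllib.request.urlopen(pubchem_3d_sdf_url(name), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_3d_sdf("cefaclor")` returns the SDF text, which can be written to a `.sdf`/`.mol` file and passed to the descriptor programs. Names containing spaces (e.g. "cefamandole nafate") are URL-encoded by `urllib.parse.quote`.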


[Figure 1 image: S. No. 1, cefaclor, 3D structure. Source: PubChem Compound.]

Figure 1. Sample 3D structure of one cephalosporin used in the study


Two software packages, QikProp and
CODESSA, were used to calculate the
descriptors. QikProp, an application in
Maestro version 10.4.018, which in turn is
part of Schrödinger Suite release 2015-4,
was used for this work. This suite of
applications predicts physically
significant descriptors and
pharmaceutically relevant properties of
organic molecules, either individually or
in batches. In addition to predicting
molecular properties, QikProp provides
ranges for comparing a particular
molecule's properties with those of 95%
of known drugs.

CODESSA version 3.2.13 was used in our
work. This software integrates all
necessary mathematical and
computational tools to calculate a large
variety of molecular descriptors (up to
400, depending on input files) on the basis
of the 3D geometrical and/or
quantum-chemical structural input of chemical
compounds. Within the framework of the
CODESSA program, a variety of statistical
techniques are also available for
structure-property correlation and for the
analysis of the experimental data in
combination with the calculated
molecular descriptors.

MOL files were imported into QikProp by
selecting the command Project > Import
structures; all the molfiles were selected
and imported. Descriptors were calculated
using the command Application > QikProp
and pressing Run. QikProp calculates all
the descriptors and creates a project
table. The data were then imported into
CODESSA using the command "Add CODESSA
input file"; this input file is in standard
CSV format and contains all the requisite
information, such as molfiles, descriptor
values, and property values.

CODESSA calculates additional descriptors
for each cephalosporin. In total, more than
200 descriptors were calculated using
QikProp and CODESSA.


Compilation of pharmacokinetic (serum protein binding) data


The reported values of serum protein
binding of cephalosporins in humans were
taken from the literature [9-15]. When
compiling pharmacokinetic data for a series
of drugs, most reviewers take the mean
value as the value of the pharmacokinetic
parameter. On similar lines, the
pharmacokinetic data for all the drugs were
compiled and the arithmetic mean was used
for the correlation studies. The mean
values for all the cephalosporins used in
the study are compiled in Table 1.
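A minimal sketch of this averaging step, using Python's standard `statistics.mean`. The drug names and value lists in `reported_pb` are hypothetical placeholders, not the literature values from [9-15]; results are rounded to two decimals, as in Table 1.

```python
from statistics import mean

# Hypothetical literature values of PB (%) per drug; NOT the data of refs [9-15].
reported_pb = {
    "drug_A": [30.0, 40.0, 42.0],
    "drug_B": [15.0],
}

# One arithmetic mean per drug, rounded to two decimals as in Table 1.
compiled_pb = {drug: round(mean(values), 2) for drug, values in reported_pb.items()}
print(compiled_pb)
```

A drug with a single reported value simply keeps that value.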


Table 1. Plasma Protein Binding Values of Selected Cephalosporins

# | Drug | PB (%) | # | Drug | PB (%)
1 | Cefaclor | 37.33 | 2 | Cefotiam | 40
3 | Cefadroxil | 15 | 4 | Cefoxitin | 68.33
5 | Cefamandole | 74.67 | 6 | Cefpiramide | 96
7 | Cefamandole nafate | 75 | 8 | Cefpodoxime | 27.5
9 | Cefatrizine | 59 | 10 | Cefprozil | 45
11 | Cefazaflur | 65 | 12 | Cefroxadine | 10
13 | Cefazedone | 95 | 14 | Cefsulodin | 30
15 | Cefazolin | 83.67 | 16 | Ceftazidime | 16.33
17 | Cefdinir | 65 | 18 | Ceftezole | 42.5
19 | Cefditoren | 88 | 20 | Ceftizoxime | 108
21 | Cefepime | 20 | 22 | Ceftriaxone | 89.67
23 | Cefetamet | 30 | 24 | Cefuroxime | 36.67
25 | Cefixime | 63 | 26 | Cephacetrile | 35
27 | Cefmenoxime | 60 | 28 | Cephalexin | 13.33
29 | Cefmetazole | 85 | 30 | Cephaloglycin | 25
31 | Cefonicid | 98 | 32 | Cephaloridine | 20
33 | Cefoperazone | 90 | 34 | Cephalothin | 66.67
35 | Ceforanide | 81.67 | 36 | Cephapirin | 46.67
37 | Cefotaxime | 37.67 | 38 | Cephradine | 11.67
39 | Cefotetan | 90 | | |


Development of meaningful correlations

Only significant descriptors calculated by
QikProp and CODESSA were used in the
correlation studies; insignificant or
intercorrelated descriptors were excluded.
Correlation studies were carried out in
CODESSA.

The selection criteria and settings used for
the "Best Multilinear Regression" in CODESSA
were as follows:

1. Maximum number of descriptors: started
from 1 and increased up to a limit that
depends on the number of molecules. A
molecule:descriptor ratio of 6:1 was used,
i.e., no more than one descriptor per 6
molecules in a series was used for
developing correlations. For example, for a
property with 21 molecules, the maximum
number of descriptors in the regression
equations was 3; similarly, for a series of
40 molecules, it was 6.

2. Maximum number of correlations per
number of descriptors: 5.

3. Correlation improvement cut-off: 0.01.

4. Maximum r² for orthogonal descriptors:
0.5.

5. Missing property value: the structure
was skipped.

The "Best Multilinear Regression" routine
tests a large number of correlations, as
each descriptor type is analyzed
individually for correlation with the
selected pharmacokinetic property.
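The descriptor limit and the intercorrelation setting listed above can be expressed in a few lines. This is a minimal sketch: `max_descriptors` reproduces the worked examples (21 molecules give 3 descriptors, 40 give 6), and `non_collinear` assumes that the "maximum r² for orthogonal descriptors" setting means two descriptors are used together only when their squared Pearson correlation is below 0.5. CODESSA's internal logic may differ, and all function names are hypothetical.

```python
import math


def max_descriptors(n_molecules: int, ratio: int = 6) -> int:
    """Largest descriptor count allowed by a molecules:descriptors ratio of `ratio`:1."""
    return n_molecules // ratio


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length descriptor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def non_collinear(x, y, max_r2: float = 0.5) -> bool:
    """Assumed reading of the 0.5 setting: keep a descriptor pair only if r^2 < max_r2."""
    return pearson_r(x, y) ** 2 < max_r2


print(max_descriptors(21), max_descriptors(40), max_descriptors(39))  # 3 6 6
```

Integer division rounds down, so the 39 cephalosporins of this study also allow at most 6 descriptors.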


3. Results and Discussion 


Correlations of serum protein binding with
structural descriptors are discussed below.

Serum protein binding data were available
for 39 of the 45 cephalosporins initially
selected, so these 39 cephalosporins were
used in the present study. Accordingly,
correlations were attempted with a maximum
of 6 descriptors, which limits the
drug:descriptor ratio to about 6:1.
Leave-one-out (LOO) cross-validation and
y-scrambling tests were also performed. The
best correlations obtained for serum
protein binding (PB) of cephalosporins are
given in Table 2. The table lists equations
ranging from a one-descriptor equation up
to an equation with the maximum number of
descriptors permitted, as described above.

Because a large number of such correlations
could be reported for each property, the
correlations are presented in equation
format. The validity of each equation and
the relative importance of the parameters
used can be judged by four statistical
criteria, namely the coefficient of
determination (R²), the cross-validated R²
(Q²), Fisher's F value, and R²rand, which is
the maximum R² obtained after randomizing
the property values and recomputing the
correlations with the descriptors. A larger
F value indicates a higher probability that
the QSPR equation is significant. These
methods provide the correlation coefficient
(r), the standard deviation (s), and the
Fisher ratio (F) of the variance explained
by the regression to the residual variance.
The significance of each equation was
evaluated from the values of these
statistical parameters.
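The four criteria can be sketched in plain Python for an ordinary least-squares model with an intercept. This is an illustration under stated assumptions, not CODESSA's implementation: the solver (normal equations with Gauss-Jordan elimination), the number of y-scrambling iterations `n_iter`, and the random `seed` are choices made here, and all function names are hypothetical.

```python
import random


def ols_fit(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] (b0 = intercept).

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination
    with partial pivoting; fails if descriptors are exactly collinear.
    """
    A = [[1.0] + list(row) for row in X]  # design matrix with intercept column
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for i in range(k):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(b, x):
    """Value of the fitted linear equation for one descriptor vector x."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def fisher_f(r2, n, p):
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) for n compounds and p descriptors."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def q2_loo(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot."""
    preds = []
    for i in range(len(y)):
        b = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without compound i
        preds.append(predict(b, X[i]))                     # predict the left-out one
    return r_squared(y, preds)


def r2_rand(X, y, n_iter=100, seed=0):
    """Largest R^2 over fits to randomly permuted (y-scrambled) property values."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_iter):
        y_perm = list(y)
        rng.shuffle(y_perm)
        b = ols_fit(X, y_perm)
        best = max(best, r_squared(y_perm, [predict(b, x) for x in X]))
    return best
```

As a consistency check, `fisher_f(0.4495, 39, 1)` gives about 30.21, in line with the F value of 30.2105 reported with R² = 0.4495 for the one-descriptor equation in Table 2 (the small difference comes from rounding of R²).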


Goodness of correlations and types of descriptors involved

Constitutional and electrostatic descriptors
resulted in statistically significant
correlations, with reasonably high values of
R² and Q² (Equations 1-6, Table 2). The
strongest correlations of serum protein
binding were obtained in the series when
more than 4 descriptors were used
(Equations 5-6, Table 2). The R² (~0.79) and
Q² (~0.62) for Equation 5 and the R² (~0.82)
and Q² (~0.54) for Equation 6 indicate the
importance of these descriptors in
describing the serum protein binding of
cephalosporins. Notably, R²rand is lower
than R² for all the equations, which
indicates that the equations obtained are
not chance correlations and hence can be
used for prediction purposes.
As it would be too voluminous to give
details of each of the equations obtained,
details are given only for the best
correlation for each property in a series.
The correlation matrix of the descriptors
used in Equation 5 is given in Table 3.


Table 2. Correlations of Protein Binding in the series of Cephalosporins

Eq. | Equation | N | R² | Q² | F | R²rand
1 | PB = [partly illegible: "... Zefirov Charge for a N Atom + ..."] | 39 | 0.4495 | 0.4032 | 30.2105 | [illegible]
2 | PB = 0.318*NSASA-1, Zefirov - 179.149*Average Valence for a N Atom + 495.983 | 39 | 0.6123 | 0.5339 | 28.4325 | 0.4549
3 | PB = 0.373*NSASA-1, Zefirov - 194.677*Average Valence for a N Atom + 6653.126*Maximum Bond Length - 11620.449 | 39 | 0.7004 | 0.5009 | 27.2722 | 0.4546
4 | PB = [first term illegible] 3405.017*Average Bond Length for a C-O Bond - 647.698*Net Zefirov Charge of All C Atoms - 197.354*Uniform-Mass, Center of Mass, X - 3904.678 | 39 | 0.7426 | 0.5304 | 24.5221 | 0.3384
5 | PB = 0.295*NSASA-1, Zefirov - 176.289*Average Valence for a N Atom + 176.363*Average Valence - 168.45*Center of Mass, Z + 5379.612*Maximum Bond Length - 9762.544 | 39 | 0.7899 | 0.6164 | 24.8073 | 0.4789
6 | PB = 0.329*NSASA-1, Zefirov - 391.292*Average Valence for a N Atom - 10624.959*Maximum Bond Length for a H-N Bond - 803.326*Average Bond Length for a C-C Bond + 171.46*Minimum Bond Length for a O Atom + 12500.639*Maximum Bond Length for a C-C Bond - 6489.243 | 39 | 0.8228 | 0.5426 | 24.7673 | 0.4991

The correlation matrix (Table 3) indicates
that the descriptors used in Equation 5 are
not strongly intercorrelated: the largest
pairwise correlation coefficient, 0.5457
(NSASA-1, Zefirov vs. Average Valence),
corresponds to r² ≈ 0.30, below the 0.5
cut-off used for descriptor selection. The
MLR regression coefficients for the
individual descriptors used in Equation 5
are given in Table 4. A plot of observed
versus predicted serum protein binding
values is given in Figure 2.
Table 3. Correlation matrix for selected descriptors of Equation 5, Table 2

Descriptor | NSASA-1, Zefirov | Average Valence for a N Atom | Average Valence | Center of Mass, Z | Maximum Bond Length
NSASA-1, Zefirov | 1.0000 | | | |
Average Valence for a N Atom | 0.1238 | 1.0000 | | |
Average Valence | 0.5457 | 0.2618 | 1.0000 | |
Center of Mass, Z | 0.0900 | 0.3093 | 0.4595 | 1.0000 |
Maximum Bond Length | -0.3589 | 0.0761 | -0.0123 | -0.0023 | 1.0000
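The 0.5 limit on r² between descriptors can be checked directly against Table 3. The sketch below copies the lower triangle of the matrix and reports the most strongly correlated descriptor pair; it is an illustrative check added here, not part of the original workflow.

```python
# Descriptor names and lower-triangular Pearson r values copied from Table 3.
NAMES = [
    "NSASA-1, Zefirov",
    "Average Valence for a N Atom",
    "Average Valence",
    "Center of Mass, Z",
    "Maximum Bond Length",
]
R = [
    [1.0000],
    [0.1238, 1.0000],
    [0.5457, 0.2618, 1.0000],
    [0.0900, 0.3093, 0.4595, 1.0000],
    [-0.3589, 0.0761, -0.0123, -0.0023, 1.0000],
]

# r^2 for every distinct descriptor pair (i > j, i.e. below the diagonal).
pairs = [(NAMES[i], NAMES[j], R[i][j] ** 2)
         for i in range(len(NAMES)) for j in range(i)]
worst = max(pairs, key=lambda t: t[2])
print(f"largest r^2 = {worst[2]:.4f} ({worst[0]} vs {worst[1]})")
print("all pairs below 0.5:", all(r2 < 0.5 for _, _, r2 in pairs))
```

The largest value, about 0.298 for Average Valence vs. NSASA-1, Zefirov, stays well under the 0.5 setting.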


Table 4. MLR regression coefficients and t-values for PB in Cephalosporins

Descriptor Name | Coeff. | t | p(t) | SE
Intercept | -9762.5440 | -2.9327 | 0.006067 | 3328.8645
NSASA-1, Zefirov | 0.2952 | 5.9752 | 1.04E-06 | 0.0494
Average Valence for a N Atom | -176.2893 | -5.2252 | 9.49E-06 | 33.7382
Average Valence | 176.3634 | 2.7857 | 8.78E-03 | 63.3095
Center of Mass, Z | -168.4497 | -3.5180 | 0.001291 | 47.8828
Maximum Bond Length | 5379.6115 | 2.9175 | 0.006305 | 1843.8894
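Each t value in Table 4 is the coefficient divided by its standard error. The sketch below encodes Equation 5 from the Table 4 coefficients, recomputes the t values for comparison, and evaluates the equation for one molecule. The descriptor values passed to the hypothetical `predict_pb` helper must come from QikProp/CODESSA calculations, which the text does not list, so any input used with it here is illustrative only.

```python
# (coefficient, standard error) pairs for Equation 5, copied from Table 4.
EQ5 = {
    "Intercept": (-9762.5440, 3328.8645),
    "NSASA-1, Zefirov": (0.2952, 0.0494),
    "Average Valence for a N Atom": (-176.2893, 33.7382),
    "Average Valence": (176.3634, 63.3095),
    "Center of Mass, Z": (-168.4497, 47.8828),
    "Maximum Bond Length": (5379.6115, 1843.8894),
}

# t values as printed in Table 4, for comparison.
REPORTED_T = {
    "Intercept": -2.9327,
    "NSASA-1, Zefirov": 5.9752,
    "Average Valence for a N Atom": -5.2252,
    "Average Valence": 2.7857,
    "Center of Mass, Z": -3.5180,
    "Maximum Bond Length": 2.9175,
}


def t_values(table):
    """t statistic of each coefficient: coefficient / standard error."""
    return {name: coef / se for name, (coef, se) in table.items()}


def predict_pb(descriptors):
    """Predicted PB (%) from Equation 5 for one molecule.

    `descriptors` maps each of the five descriptor names to its computed value.
    """
    missing = set(EQ5) - {"Intercept"} - set(descriptors)
    if missing:
        raise ValueError(f"missing descriptors: {sorted(missing)}")
    return EQ5["Intercept"][0] + sum(
        EQ5[name][0] * descriptors[name] for name in EQ5 if name != "Intercept"
    )


for name, t in t_values(EQ5).items():
    print(f"{name:30s} computed t = {t:8.4f}  reported t = {REPORTED_T[name]:8.4f}")
```

The recomputed t values agree with Table 4 to within about 0.001; the residual differences come from the rounding of the printed coefficients and standard errors.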


[Figure 2 image: predicted PB (y-axis) plotted against experimental PB (x-axis, 0 to 100).]

Figure 2. Plot of experimental vs. predicted PB


In the literature, serum protein binding of
drugs has primarily been correlated with
lipophilicity [16-22]. Relationships with
graph-theoretical descriptors [23],
connectivity indices and pKa [24], steric
parameters [25], and electronic descriptors
[22] have also been studied. Our findings
are similar to these reports. Hydrophobicity
(HumanAbsorption) descriptors did not
correlate well in cephalosporins; however,
all other types of descriptors, namely
constitutional, topological, and
electrostatic, were part of the final
correlations.


Conclusion 


Structure-pharmacokinetic relationships
were established for protein binding in
cephalosporins. The strongest correlations
of serum protein binding were obtained in
the cephalosporin series when more than
4 descriptors were used. High R² (~0.82)
and Q² (~0.54) values indicate the
importance of these descriptors in
describing the serum protein binding of
cephalosporins. The correlation matrix
also indicated that the descriptors used
in the correlation are not strongly
intercorrelated with one another. Also,
R²rand values lower than R² indicate
that the equations obtained are not
chance correlations.